Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Making MKL-DNN default on MXNet master #13681

Merged
merged 116 commits into from
Jan 3, 2019
Merged

Conversation

mseth10
Copy link
Contributor

@mseth10 mseth10 commented Dec 19, 2018

Description

This PR continues work of PR #13464 and aims to make MKL-DNN default on MXNet master for Linux kernels with Intel/AMD processors..

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mseth10
Copy link
Contributor Author

mseth10 commented Dec 26, 2018

@xinyu-intel No, we need not modify USE_MKLDNN flag in config.mk file. USE_MKLDNN=1 is set on select kernel/processors in Makefile and CMakefile.

Copy link
Member

@TaoLv TaoLv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now. Thanks for your effort @mseth10 . Seems there is a trouble with CI. Do you mind taking a look at the failure?

@pengzhao-intel
Copy link
Contributor

@mseth10 @azai91 could you rebase and pass the CI? This will be the final step before the merge:)
Thanks all of your efforts.

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@azai91
Copy link
Contributor

azai91 commented Dec 29, 2018

mseth is out for the next two weeks so I am continuing his PR on #13744. @pengzhao-intel @TaoLv please review this version (all I did was retrigger as it fails on what I believe is a flaky test on windows-gpu stage. documented here #13743)

@pengzhao-intel
Copy link
Contributor

@azai91 could you let @mseth10 adding you into collaborator list so that we can continue work in this PR?

https://help.github.com/articles/inviting-collaborators-to-a-personal-repository/

@azai91
Copy link
Contributor

azai91 commented Dec 29, 2018

that would be ideal. would be a lot cleaner to just retrigger on this PR compared with starting a new one just for one additional commit. however, seth is out in india without his laptop until 1/11. I tried to find ways to push directly onto his repo / retrigger from the jenkins GUI with no avail. @sandeep-krishnamurthy, since you are a committer, do you have permission to retrigger the last commit on this PR from the jenkins GUI?

@mseth10
Copy link
Contributor Author

mseth10 commented Dec 29, 2018

@azai91 I have added you as a collaborator to my mxnet repo. You should be able to re-trigger now. Sorry for the delayed reaponse.
FYI, commiters have the power to re-trigger a particular CI stage (e.g. windows gpu) from GUI. That would be a lot faster. Can you please re-trigger that failed stage @nswamy @sandeep-krishnamurthy ?

@TaoLv
Copy link
Member

TaoLv commented Jan 2, 2019

@mseth10 @azai91 Thank you. Now this PR looks good to me. Before merging, I want to double check with @szha , @marcoabreu if we need adjust the night build releasing process for this change? I assume that pip install mxnet==1.5.0b20190103 will have USE_MKLDNN=0 while pip install mxnet-mkl==1.5.0b20190103 will have USE_MKLDNN=1.

https://pypi.org/project/mxnet/#history
https://pypi.org/project/mxnet-mkl/#history

@azai91
Copy link
Contributor

azai91 commented Jan 2, 2019

it is explicitly turned off (https://www.dropbox.com/s/fk2jipmiivfjog0/Screenshot%202019-01-02%2010.11.43.png?dl=0).

@pengzhao-intel
Copy link
Contributor

Thanks for the info @azai91 . I think the PR is good to merge.

@TaoLv @sandeep-krishnamurthy could you help merge the code?

@TaoLv
Copy link
Member

TaoLv commented Jan 3, 2019

@azai91 Thank you for the information.

@TaoLv TaoLv merged commit 855f8b9 into apache:master Jan 3, 2019
@pengzhao-intel
Copy link
Contributor

Add next step in here #12591 (comment)

@@ -20,7 +20,7 @@ mxnet_option(USE_F16C "Build with x86 F16C instruction support" ON)
mxnet_option(USE_LAPACK "Build with lapack support" ON)
mxnet_option(USE_MKL_IF_AVAILABLE "Use MKL if found" ON)
mxnet_option(USE_MKLML_MKL "Use MKLDNN variant of MKL (if MKL found)" ON IF USE_MKL_IF_AVAILABLE AND (NOT APPLE))
mxnet_option(USE_MKLDNN "Use MKLDNN variant of MKL (if MKL found)" ON IF USE_MKL_IF_AVAILABLE AND (NOT APPLE))
mxnet_option(USE_MKLDNN "Use MKLDNN variant of MKL (if MKL found)" ON IF USE_MKL_IF_AVAILABLE AND (NOT APPLE) AND (NOT MSVC) AND (CMAKE_SYSTEM_PROCESSOR MATCHES x86_64))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually CMAKE_SYSTEM_PROCESSOR will not work for cross compilation. You could reuse a variable CMAKE_CROSSCOMPILING for this as shown here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest we check for
SYSTEM_ARCHITECTURE STREQUAL "x86_64" AND NOT CMAKE_CROSSCOMPILING
instead of
CMAKE_SYSTEM_PROCESSOR MATCHES x86_64?

Also, will CMAKE_HOST_SYSTEM_PROCESSOR MATCHES x86_64 help in case of cross compilation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think SYSTEM_ARCHITECTURE STREQUAL "x86_64" AND NOT CMAKE_CROSSCOMPILING should work. But you need to add the trick for CMAKE_CROSSCOMPILING from here as well:

# workaround to store CMAKE_CROSSCOMPILING because is getting reset by the project command
if(CMAKE_CROSSCOMPILING)
  set(__CMAKE_CROSSCOMPILING ${CMAKE_CROSSCOMPILING})
  set(__CMAKE_CROSSCOMPILING_OVERRIDE ON)
endif()

project(mxnet C CXX)

 if(__CMAKE_CROSSCOMPILING_OVERRIDE)
  set(CMAKE_CROSSCOMPILING ${__CMAKE_CROSSCOMPILING})
endif()

Copy link
Contributor Author

@mseth10 mseth10 Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lebeg I checked, CMAKE_SYSTEM_PROCESSOR works for cross compilation. CMAKE_CROSSCOMPILING is TRUE only for ARM v6, v7, v8, and for all these cases CMAKE_SYSTEM_PROCESSOR exists and CMAKE_SYSTEM_PROCESSOR MATCHES x86_64 returns FALSE. Hence, I don't think any change is needed. Please correct me if I am missing anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the semantics of mxnet_option. If the condition is not satisfied, this option will be turned off ,not setting the default value off.

@aaronmarkham
Copy link
Contributor

Does this update have anything to do with this nightly test failure (cmake + mkldnn)?

rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jan 9, 2019
* mkldnn is default makefile and explicitly turn off for buidls

* add endif

* retrigger

* retrigger

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* retrigger

* remove commented code

* retrigger

* remove mkldnn dnaymic for unitest

* retrigger

* retrigger

* force static for mkldnn lib

* turn of mkldnn on arm builds

* remove dynamic mkldnn bind

* update jenkins to use only mkldnn

* remove last flag

* turn mkldnn by default on mac

* move mkldnn files for GPU MKLDNN build

* copy lib mxnet in gpu build

* only link windows

* add mkldnn.mk

* try force linking

* retrigger

* retrigger

* remove mkldnn dynanmic check

* use ifndef

* remove test mkldnn install

* fix spacing

* fix index

* remove cp of mkldnn since statically linked

* add libmkldnn.a to list of files to pack

* include mkl_ml

* add mkldnn to pack

* add libiomp to ci pack

* move static libs

* fix typo

* pack mkldnn

* retrigger

* add linux artifacts

* move libmkldnn in gpu cmake build

* move libmkldnn and libiomp5 on gpu workspace

* move linked files

* fix typo

* fix typo

* add artifacts for tensorrt

* move mkldnn lib in scala build

* move mkldnn lib on cpu scala

* create dir for binding

* rename libmkldnn in scala

* move mklml dep in scala builds

* move mkl to another linked folder

* move libmkl to another dir

* add libmklml

* move mkldnn

* move mkldnn on centos

* specify new dynamic path

* retrigger

* remove mkldnn dynamic lib

* remove moving mkldnn artifact

* add ld path

* retrigger

* Revert "remove moving mkldnn artifact"

This reverts commit 16cca19.

* Revert "remove mkldnn dynamic lib"

This reverts commit d510436.

* update makefile

* Revert RPATH change and trigger CI

* correcting use-mkldnn flags for two tests

* mkldnn default on linux for starters

* reverting naming rules of pack_lib

* adding mkldnn=0 flags to centos non mkldnn builds

* adding mkldnn=0 flags to ubuntu gpu non mkldnn builds

* removing mkldnn binary operation for ubuntu gpu cmake non mkldnn build

* removing mkldnn binary operation for centos non-mkldnn unittest

* adding explicit USE_MKLDNN=0 flags for clang builds

* adding explicit USE_MKLDNN=0 flags for cpu ubuntu builds

* removing mkldnn binaries from non mkldnn builds scala gpu

* adding explicit flag mkldnn=0 for tensorrt gpu build

* adding explicit flag mkldnn=0 for ubuntu cmake asan

* adding centos cpu mkldnn tests to CI

* adding CentOS GPU MKLDNN build and unittest

* not keeping mkldnn default for mac os

* setting mkldnn default for x86_64 only

* running docs with mkldnn=0 flag

* removing CentOS CPU Scala MKLDNN test

* setting mkldnn default for x86_64 only

* not making mkldn default on windows

* removing Centos MKLDNN tests from CI

* retrigger

* retrigger

* retrigger
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* mkldnn is default makefile and explicitly turn off for buidls

* add endif

* retrigger

* retrigger

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* retrigger

* remove commented code

* retrigger

* remove mkldnn dnaymic for unitest

* retrigger

* retrigger

* force static for mkldnn lib

* turn of mkldnn on arm builds

* remove dynamic mkldnn bind

* update jenkins to use only mkldnn

* remove last flag

* turn mkldnn by default on mac

* move mkldnn files for GPU MKLDNN build

* copy lib mxnet in gpu build

* only link windows

* add mkldnn.mk

* try force linking

* retrigger

* retrigger

* remove mkldnn dynanmic check

* use ifndef

* remove test mkldnn install

* fix spacing

* fix index

* remove cp of mkldnn since statically linked

* add libmkldnn.a to list of files to pack

* include mkl_ml

* add mkldnn to pack

* add libiomp to ci pack

* move static libs

* fix typo

* pack mkldnn

* retrigger

* add linux artifacts

* move libmkldnn in gpu cmake build

* move libmkldnn and libiomp5 on gpu workspace

* move linked files

* fix typo

* fix typo

* add artifacts for tensorrt

* move mkldnn lib in scala build

* move mkldnn lib on cpu scala

* create dir for binding

* rename libmkldnn in scala

* move mklml dep in scala builds

* move mkl to another linked folder

* move libmkl to another dir

* add libmklml

* move mkldnn

* move mkldnn on centos

* specify new dynamic path

* retrigger

* remove mkldnn dynamic lib

* remove moving mkldnn artifact

* add ld path

* retrigger

* Revert "remove moving mkldnn artifact"

This reverts commit 16cca19.

* Revert "remove mkldnn dynamic lib"

This reverts commit d510436.

* update makefile

* Revert RPATH change and trigger CI

* correcting use-mkldnn flags for two tests

* mkldnn default on linux for starters

* reverting naming rules of pack_lib

* adding mkldnn=0 flags to centos non mkldnn builds

* adding mkldnn=0 flags to ubuntu gpu non mkldnn builds

* removing mkldnn binary operation for ubuntu gpu cmake non mkldnn build

* removing mkldnn binary operation for centos non-mkldnn unittest

* adding explicit USE_MKLDNN=0 flags for clang builds

* adding explicit USE_MKLDNN=0 flags for cpu ubuntu builds

* removing mkldnn binaries from non mkldnn builds scala gpu

* adding explicit flag mkldnn=0 for tensorrt gpu build

* adding explicit flag mkldnn=0 for ubuntu cmake asan

* adding centos cpu mkldnn tests to CI

* adding CentOS GPU MKLDNN build and unittest

* not keeping mkldnn default for mac os

* setting mkldnn default for x86_64 only

* running docs with mkldnn=0 flag

* removing CentOS CPU Scala MKLDNN test

* setting mkldnn default for x86_64 only

* not making mkldn default on windows

* removing Centos MKLDNN tests from CI

* retrigger

* retrigger

* retrigger
@mseth10 mseth10 deleted the mkldnn-default branch June 1, 2020 10:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.